DAOS-18785 object: handle resent RPC on DTX non-leader - b28 #17989

Merged
gnailzenh merged 1 commit into release/2.8 from Nasf-Fan/DAOS-18785_b28
Apr 17, 2026

@Nasf-Fan Nasf-Fan commented Apr 13, 2026

Most resent RPCs are detected and handled on the DTX leader. However, when the DTX leader changes, for example because the old DTX leader is dead or evicted, the DTX for some in-flight I/O may be in the 'prepared' state on a non-leader while the client resends the RPC to the new DTX leader. In that case, DTX resync may not yet have handled the DTX. The I/O handler on the non-leader therefore needs to check whether the DTX has already been prepared; if so, it replies directly to the DTX leader, so that lower-layer logic is not misled into generating a confusing error.

Add a new test case for this scenario.
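
The check described above can be sketched roughly as follows. This is a minimal illustration only, not the code from this patch: every name here (`sketch_dtx_entry`, `sketch_dtx_prepared`, `sketch_non_leader_handle`, the `SKETCH_*` values) is hypothetical, and the DTX table is modeled as a plain array rather than the real DAOS data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of a DTX as seen on a non-leader target. */
struct sketch_dtx_entry {
	uint64_t id;       /* stand-in for the real DTX identifier */
	bool     prepared; /* true once the modification was prepared locally */
};

/* What the non-leader I/O handler decides to do with an incoming RPC. */
enum sketch_action {
	SKETCH_EXECUTE,         /* run the modification as usual */
	SKETCH_REPLY_TO_LEADER  /* already prepared: just reply to the leader */
};

/* Return true if the DTX with this id was already prepared locally. */
static bool
sketch_dtx_prepared(const struct sketch_dtx_entry *tbl, size_t n, uint64_t id)
{
	assert(tbl != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (tbl[i].id == id)
			return tbl[i].prepared;
	}
	return false; /* unknown DTX: nothing was prepared */
}

/*
 * For a resent RPC whose DTX is already prepared, skip re-execution and
 * reply directly, so lower layers never see the duplicate modification.
 */
static enum sketch_action
sketch_non_leader_handle(const struct sketch_dtx_entry *tbl, size_t n,
			 uint64_t id, bool resent)
{
	if (resent && sketch_dtx_prepared(tbl, n, id))
		return SKETCH_REPLY_TO_LEADER;

	return SKETCH_EXECUTE;
}
```

In this sketch, a first-time RPC or a resent RPC with no prepared DTX still goes through normal execution; only the resent-and-prepared combination short-circuits to a direct reply.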

Allow-unstable-test: true

Steps for the author:

  • Commit message follows the guidelines.
  • Appropriate Features or Test-tag pragmas were used.
  • Appropriate Functional Test Stages were run.
  • At least two positive code reviews including at least one code owner from each category referenced in the PR.
  • Testing is complete. If necessary, forced-landing label added and a reason added in a comment.

After all prior steps are complete:

  • Gatekeeper requested (daos-gatekeeper added as a reviewer).

@github-actions

Ticket title is 'test_ec_multiple_rank_failure failed during IOR: dfs_write(0x558292ef2000, 2048) failed (5): Input/output error'
Status is 'In Progress'
https://daosio.atlassian.net/browse/DAOS-18785

@Nasf-Fan Nasf-Fan force-pushed the Nasf-Fan/DAOS-18785_b28 branch 3 times, most recently from 5ace005 to 1b23afb Compare April 14, 2026 02:29
Most resent RPCs are detected and handled on the DTX leader. However,
when the DTX leader changes, for example because the old DTX leader is
dead or evicted, the DTX for some in-flight I/O may be in the 'prepared'
state on a non-leader while the client resends the RPC to the new DTX
leader. In that case, DTX resync may not yet have handled the DTX. The
I/O handler on the non-leader therefore needs to check whether the DTX
has already been prepared; if so, it replies directly to the DTX leader,
so that lower-layer logic is not misled into generating a confusing
error.

Add a new test case for this scenario.

Allow-unstable-test: true

Signed-off-by: Fan Yong <[email protected]>
@Nasf-Fan Nasf-Fan force-pushed the Nasf-Fan/DAOS-18785_b28 branch from 1b23afb to 1d3a130 Compare April 14, 2026 14:30
@daosbuild3
Collaborator

Test stage Functional Hardware Medium MD on SSD completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-17989/6/testReport/

@Nasf-Fan
Contributor Author

Test stage Functional Hardware Medium MD on SSD completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-17989/6/testReport/

test_dfuse_daos_build_wb failed because of DAOS-18813, which is unrelated to this patch.

@Nasf-Fan Nasf-Fan marked this pull request as ready for review April 17, 2026 02:22
@Nasf-Fan Nasf-Fan requested review from a team as code owners April 17, 2026 02:22
@Nasf-Fan Nasf-Fan added the clean-cherry-pick Cherry-pick from another branch that did not require additional edits label Apr 17, 2026
@gnailzenh gnailzenh merged commit ec1dac0 into release/2.8 Apr 17, 2026
38 of 40 checks passed
@gnailzenh gnailzenh deleted the Nasf-Fan/DAOS-18785_b28 branch April 17, 2026 12:46